Classification and prediction of drought and salinity stress tolerance in barley using GenPhenML

Genetic and agronomic advances consistently lead to an annual increase in global barley yield. Since abiotic stresses (physical environmental factors that negatively affect plant growth) reduce barley yield, it is necessary to predict the resistance of barley genotypes to them. Artificial intelligence and machine learning (ML) models are new and powerful tools for predicting crop resilience. To address the research gap in the use of molecular markers for predicting abiotic stress tolerance, this paper introduces a new approach called GenPhenML that combines molecular markers and phenotypic traits to predict the resistance of barley genotypes to drought and salinity stresses with ML models. GenPhenML uses feature selection algorithms to determine the most important molecular markers. It then identifies the model that predicts stress resistance with the lowest MAE and RMSE and the highest R2. The results showed that GenPhenML with a neural network (NN) model predicted the salinity stress resistance score with MAE, RMSE and R2 values of 0.1206, 0.0308 and 0.9995, respectively. The NN model also predicted drought stress scores with MAE, RMSE and R2 values of 0.0727, 0.0105 and 0.9999, respectively. GenPhenML was also used to classify barley genotypes as stress-resistant or stress-sensitive. The accuracy, precision and F1 score of the proposed approach for salinity and drought stress classification were higher than 97%.


Prediction of drought and salinity stress score
After randomly partitioning the phenotype and genotype data into training and test datasets, GenPhenML trains the RF, SVM, NN, GP and DT models, tests them on data held out from training, and selects the best-performing model. For the prediction of salinity and drought stress scores, the ReliefF, MRMR and F-test FS algorithms were used to select an appropriate subset of phenotype and genotype features. Each ML model was trained using the features selected by the FS algorithms. The performance of all ML models was evaluated by the MAE, RMSE and R2 criteria. The model with the lowest MAE over the test dataset was selected as the best-performing model. The results of salinity stress prediction using phenotype and genotype features are presented in Table 1, which reports the performance of the three FS algorithms and five ML models over the training and testing phases. The results showed that the models trained with phenotype and genotype features do not perform well in predicting the salinity and drought stress scores.
The performance of the five ML models in predicting plant salinity stress using a combination of phenotype and genotype features is presented in Table 2. The results showed that the ReliefF algorithm and the NN model outperformed the other models in the training and test phases. The MAE, RMSE and R2 values obtained for the NN model in the training phase were 0.0764, 0.0073 and 0.9999, respectively. In the test phase, this model had MAE, RMSE and R2 values of 0.1206, 0.0308 and 0.9995, respectively. As Table 2 also shows, the ReliefF algorithm and the NN model perform best in predicting drought stress compared to the other ML models. The NN model had MAE, RMSE and R2 values of 0.04, 0.01 and 0.99 over the training phase and 0.07, 0.01 and 0.99 over the testing phase, respectively.
The real and predicted stress scores, together with the regression equation and R2 of the NN model over the training and test datasets, are compared in Fig. 1 for salinity stress and in Fig. 2 for drought stress. The training sample points are distributed near the perfect-fit line ("actual stress scores = predicted stress scores"). The R2 values are above 0.98, indicating that the model fits the training data well. After training, the test dataset was used to verify and evaluate the model. As shown in these figures, the correlation and error between the predicted and actual stress scores of the test dataset indicate that the test sample points are also distributed near the perfect-fit line. Overall, the NN models reach high prediction accuracy for both stresses.

Classification of salinity and drought stress tolerance
The results of salinity stress tolerance classification by applying phenotype features are presented in Table 3. Using accuracy over the test dataset to compare the FS algorithms and ML models, the ReliefF algorithm and the KNN model outperformed the other models, with accuracy, precision and F1 score equal to 0.95, 0.96 and 0.95 in the training phase and 0.91, 0.95 and 0.91 in the test phase, respectively. The results of salinity stress classification based on the combination of phenotype and genotype features are also presented in Table 3. In this case, the MRMR algorithm and the KNN model performed better than the other algorithms in the training and test phases. The accuracy, precision and F1 score of the KNN model were 0.99, 0.98 and 0.99 in the training phase and 0.98, 0.99 and 0.98 in the testing phase, respectively.
With similar results, the ReliefF algorithm and the KNN model classified drought stress tolerance with accuracy, precision and F1 score equal to 0.99, 0.99 and 0.99 in the training phase and 0.89, 0.89 and 0.90 in the test phase, respectively (Table 4). The table shows that the KNN model performs better in classifying drought stress than the other ML models, and the comparison of FS algorithms shows that the ReliefF algorithm gives better results than the other FS algorithms. The results also show accuracy, precision and F1 score values for the KNN model of 0.99, 0.99 and 0.99 in the training phase and 0.85, 0.86 and 0.85 in the testing phase.
Table 4 shows that the KNN model performs better than the other ML models in classifying drought stress. A comparison of FS algorithms also shows that the ReliefF algorithm gives better results than the other FS algorithms. The accuracy, precision and F1 score of the KNN model are 0.99, 0.99 and 0.98 in the training phase and 0.97, 0.99 and 0.97 in the testing phase. We used the confusion matrix to show the detailed performance of the ML models in the classification of salinity and drought stress tolerance. A confusion matrix, with its multifaceted views, is fundamental in evaluating classification performance. Confusion matrices were created for the training and testing datasets. The columns of the confusion matrix correspond to the actual data, and the rows represent the classification results of the test data. The confusion matrices of the ML models in the classification of salinity stress during the training and test stages are shown in Fig. 3. This figure presents the confusion matrices of the KNN classifier with the ReliefF, MRMR and Chi2 FS algorithms.
The confusion matrices of the ML models in the classification of drought stress during the training and test stages are shown in Fig. 4. This figure presents the confusion matrices of the KNN classifier with the ReliefF, MRMR and Chi2 FS algorithms.
With the KNN classifier selected as the best-performing model for classifying salinity and drought stress tolerance, four basic ratio metrics, namely True Positive Rate (TPR), Positive Predictive Value (PPV), False Negative Rate (FNR) and False Discovery Rate (FDR), are shown in Table 5. In terms of TPR and PPV during the test stage, the MRMR FS algorithm had the best performance in the classification of salinity stress tolerance, while the ReliefF FS algorithm outperformed the other algorithms in the classification of drought stress tolerance. In terms of FNR and FDR, the MRMR and ReliefF FS algorithms resulted in the lowest classification error for salinity and drought stress tolerance, respectively.
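The four ratio metrics reported in Table 5 follow directly from the counts of a two-class confusion matrix. The following is a minimal sketch of that calculation; the function name `ratio_metrics` and the counts used are hypothetical illustrations and are not taken from Table 5.

```python
def ratio_metrics(tp, fp, fn):
    """Return TPR, PPV, FNR and FDR for the positive class.

    tp, fp, fn: counts of true positives, false positives and false negatives.
    """
    tpr = tp / (tp + fn)  # True Positive Rate: share of actual positives found
    ppv = tp / (tp + fp)  # Positive Predictive Value: share of predicted positives that are correct
    fnr = fn / (tp + fn)  # False Negative Rate, equal to 1 - TPR
    fdr = fp / (tp + fp)  # False Discovery Rate, equal to 1 - PPV
    return {"TPR": tpr, "PPV": ppv, "FNR": fnr, "FDR": fdr}


# Hypothetical counts for illustration only (not values from the paper)
example = ratio_metrics(tp=48, fp=2, fn=1)
```

Because FNR = 1 - TPR and FDR = 1 - PPV, an algorithm with the highest TPR and PPV necessarily has the lowest FNR and FDR, which is why the two comparisons in the text point to the same FS algorithms.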

Discussion
During long-term exposure to drought stress, agricultural plants may be destroyed, or their production may be significantly reduced. Soil salinity is likewise a major factor in reducing agricultural production. Agricultural operations that prevent the salinization of fields, such as drainage, planting perennial plants and low irrigation of fields, are one way to deal with salinity. Drought and salinity stress tolerance were classified and predicted in this paper. The classification accuracy for salinity and drought stress was 0.98 and 0.97, respectively, so both stresses were classified with high accuracy. A high-accuracy stress classification model will significantly help in identifying lines that resist drought and salt stress. In the prediction phase, the drought stress score was predicted better than the salinity score; however, the R2 of both stress predictions was above 0.99. Since determining the stress score from the plant's appearance requires an expert's knowledge and experience, stress score predictors reduce the dependence on individual judgment and make the scoring process more precise.
This research shows the importance of using phenotype and genotype traits in stress tolerance modeling in barley lines. Using phenotype and genotype traits together improved the ML models' performance compared to using either alone. Feature selection algorithms reduce the time and cost required for phenotype and genotype measurements. The ReliefF algorithm performed better in the classification and prediction schemes than the other FS algorithms. ReliefF is a filter FS method inspired by instance-based learning. It is a well-known preprocessing method that can be used in many data mining problems. ReliefF effectively ranks features based on their quality and can work on both nominal and numerical datasets. It estimates the importance of features by calculating the differences in feature values between neighboring instances. ReliefF can work on datasets with missing values and on datasets with more than two classes. Instead of selecting a single nearest neighbor, as the original Relief algorithm does, it uses several nearest neighbors from each class.
A comparison of the ML models' performance in classifying barley tolerance to drought and salinity showed that the KNN model gives much better results than the other models. The KNN model is simple and cheap to implement, does not require a parameter estimation stage, is capable of nonlinear modeling, and works efficiently with many categories of data. Because of its simplicity, this model can be one of the best options for multi-class classification.
In the prediction section, the NN model outperformed the other models. NNs are examples of flexible regression approaches, but they differ fundamentally from classical (parametric) techniques: no initial assumption about the shape of the model is required. For problems that include complex nonlinear relationships between variables, such approaches are better suited than parametric models. However, NNs are known as black-box techniques and cannot solve problems defined without uncertainty. Uncertainty often arises from the rapid development of new technologies, inaccurate and insufficient data, and a lack of confidence in the adequacy of the defined independent variables. Two critical factors in tuning the NN model, and thereby increasing or decreasing its error rate, are the number of hidden layers and the number of units in each layer. The greater the number of hidden layers, the more flexible the network, and both its accuracy and its computational cost increase; however, this number cannot be increased indefinitely, because the training may not converge to the correct answer.

Conclusion
In this study, we proposed GenPhenML, a new approach that combines molecular markers and phenotypic data to predict the resistance of barley cultivars to abiotic stress (drought and salinity) using ML models. By finding the main molecular markers and selecting the best model, GenPhenML successfully predicted the stress score: the NN model showed MAE of 0.1206 and 0.0727, RMSE of 0.0308 and 0.0105, and R2 of 0.9995 and 0.9999 for salinity and drought predictions, respectively. In addition, GenPhenML successfully classified barley cultivars into stress-tolerant and stress-sensitive categories with greater than 97% accuracy for both types of stress. These findings highlight the potential of GenPhenML as a powerful tool for barley breeding programs to develop new stress-tolerant varieties and ultimately contribute to global food security.

Data preparation
The phenotype and genotype features of the barley lines were determined from their agronomic characteristics under saline and drought conditions. For stress score prediction, 1236 data samples were collected from barley lines and divided randomly into training and test datasets comprising 70% and 30% of the data, respectively. For stress tolerance classification, 1128 data samples were divided randomly into training (70%) and test (30%) datasets. In the greenhouse at Gonbad Kavous University, 103 lines of F8 families resulting from a cross between Badia and Kavir were examined using a completely randomized design with three replications. Planting was done in pots with a 5-kg soil capacity, with seven seedlings per line. The population was developed to present the plant genetic materials under the Gonbad Kavous University's license. All the methods were performed in accordance with relevant guidelines and regulations. Table 6 shows some physical and chemical features of the soil.
Drought stress was applied during the reproductive stage, with a moisture content of 0.8 field capacity, equal to 20% moisture by weight. Irrigation was performed every other week, and the moisture level was lowered to 9% by weight. The soil moisture level was adjusted by assessing the amount of moisture lost and compensating with water (20%). Salinity stress was applied during the reproductive stage by irrigation with water of 16 dS m-1 from a chloride salt source. Weekly assessments of the salinity of the saturated extract in the pots showed a weekly increase of up to 10-17 dS m-1. The saturated extract was prepared by placing 150 g of potting soil in a plastic bucket, adding distilled water, and mixing until the surface glistened. For phenotyping, 15 competing plants of each line were measured, and their average was used in the analysis. Phenotype scores were measured according to the protocols recommended by Chang and Yoshida 14,15. The measurement instructions are provided in Tables 7 and 8.
The genotyping analysis was performed using a crude DNA preparation. A single leaf was collected into a labeled 1.5 ml centrifuge tube and kept on ice. The leaf sample was macerated in 400 μl of extraction buffer (50 mM Tris-HCl, pH 8.0, 2.5 mM EDTA, 300 mM NaCl and 1% SDS) and ground until the buffer turned green. A further 400 μl of extraction buffer was then added and mixed by pipetting. The contents were centrifuged for 10 min at 12,000 g in a microcentrifuge. Nearly 400 μl of lysate was extracted with 400 μl of chloroform. The top supernatant was transferred to another 1.5 ml tube, where the DNA was precipitated with absolute ethanol. We centrifuged the contents for three minutes at full speed and discarded the supernatant. We rinsed the pellet with 70% ethanol and dried the DNA before resuspending it in 50 μl of TE buffer (10 mM Tris-HCl, pH 8.0, 1 mM EDTA, pH 8.0). An aliquot of the solution was used for PCR analysis, and the remaining solution was stored at -20°C.
For marker analysis, 365 SSR markers well distributed over the seven barley chromosomes were used 16. The DNA of each line was amplified using the SSR primers that exhibited polymorphism. PCR was performed in a thermocycler (iCycler, Bio-Rad, USA) with 50 ng of template DNA in a 15 μl reaction mixture containing 0.67 M primers, 10 μl reaction buffer, 2.5 mM MgCl2, 0.2 mM dNTPs and 0.5 U Taq polymerase. The PCR program consisted of an initial denaturation at 94°C for 5 min; 30 cycles of denaturation at 94°C for 1 min, annealing at 58°C for 1 min and elongation at 72°C for 1.5 min; and a final extension at 72°C for 5 min, followed by storage in a refrigerator at 4°C. The final products were separated by 6% polyacrylamide gel electrophoresis and visualized by silver staining. ISSR, iPBS, IRAP, SCoT and CAAT markers were employed for the parental investigation. When the band was amplified in the first parent, scores of 1 and 3 were used for the presence and absence of the band, respectively. Scores of 2 and 4 were used in the same way when the band was amplified in the second parent.

Phenotype and genotype features
Phenotype data include 15 phenotype features obtained from each plant by direct measurement. Genotype features consist of 719 molecular markers determined by genetic measurements. These genotype features were used for the prediction of salinity and drought stress. Three FS algorithms (ReliefF, MRMR and F-test) were deployed to determine the important genotype features.

Feature selection
Over the past decades, advances in data collection and storage have forced many sciences to face vast amounts of information. FS algorithms reduce the dimensionality of the data by selecting appropriate subsets of the original features 17. This paper used the ReliefF, MRMR, F-test and Chi2 algorithms to select the appropriate number of features to train the ML models.
As a filter-type evaluation algorithm, ReliefF can detect feature dependencies. It uses the concept of nearest neighbors to obtain feature statistics. In addition, it retains the general advantages of filter algorithms, such as relatively fast convergence and independence of the selected features from the induction algorithm. The diff function in the ReliefF algorithm calculates the difference in the value of feature A between two samples, I1 and I2, where I1 = Ri (Ri is the target instance) and I2 is either H or M, and it is used in the weight updates. Relief identifies the two nearest neighbors of the target: one with the same class, called the nearest hit (H), and one with the opposite class, called the nearest miss (M). For discrete features, the diff function is defined as follows 19:
$$\mathrm{diff}(A, I_1, I_2) = \begin{cases} 0, & \mathrm{value}(A, I_1) = \mathrm{value}(A, I_2) \\ 1, & \text{otherwise} \end{cases} \tag{1}$$
Furthermore, for continuous features, diff is defined as:
$$\mathrm{diff}(A, I_1, I_2) = \frac{\left|\mathrm{value}(A, I_1) - \mathrm{value}(A, I_2)\right|}{\max(A) - \min(A)} \tag{2}$$
The MRMR algorithm is based on the mutual information between two feature spaces, which increases as the dependence between the two feature vectors increases. The mutual information between two variables, x and y, is obtained according to Eq. 3 based on their probability density functions 20:
$$I(x; y) = \iint p(x, y) \log \frac{p(x, y)}{p(x)\,p(y)} \, dx \, dy \tag{3}$$
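The ReliefF weight update described above, built on the diff function with a nearest hit (H) and a nearest miss (M), can be illustrated with a short sketch. This is a simplified version under stated assumptions, not the implementation used in the paper: it handles two classes and continuous features only, uses a single nearest hit and miss per target (as in the original Relief), visits every instance once instead of sampling, and measures distance as the sum of per-feature diffs. The function names and the toy data are hypothetical.

```python
def diff(a, s1, s2, ranges):
    """Range-normalized difference of continuous feature a between two samples."""
    return abs(s1[a] - s2[a]) / ranges[a] if ranges[a] else 0.0


def relief_weights(X, y):
    """Feature weights from one nearest hit and one nearest miss per instance."""
    n, d = len(X), len(X[0])
    ranges = [max(row[a] for row in X) - min(row[a] for row in X) for a in range(d)]

    def distance(i, j):
        # Sum of per-feature diffs (a Manhattan distance on normalized features)
        return sum(diff(a, X[i], X[j], ranges) for a in range(d))

    weights = [0.0] * d
    for i in range(n):  # every instance serves once as the target R_i
        hits = [j for j in range(n) if j != i and y[j] == y[i]]
        misses = [j for j in range(n) if y[j] != y[i]]
        hit = min(hits, key=lambda j: distance(i, j))     # nearest hit H
        miss = min(misses, key=lambda j: distance(i, j))  # nearest miss M
        for a in range(d):
            # Reward features that differ on the miss, penalize those that differ on the hit
            weights[a] += (diff(a, X[i], X[miss], ranges) - diff(a, X[i], X[hit], ranges)) / n
    return weights


# Hypothetical toy data: feature 0 separates the classes, feature 1 does not
X = [[0.0, 5.0], [0.1, 1.0], [0.2, 4.0], [0.9, 4.5], [1.0, 1.5], [0.8, 5.0]]
y = [0, 0, 0, 1, 1, 1]
weights = relief_weights(X, y)
```

On this toy data the class-separating feature receives a positive weight and the uninformative feature a negative one, which is the ranking behavior the filter relies on.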
In the maximum relevance (maximum correlation) method, FS requires the mutual information I to have the highest value with class c, which reflects the strongest dependence of feature x on class c. Maximum relevance is one of the optimal feature search methods; it is obtained by Eq. 4 as the average of all mutual information values between the individual features x_i and class c:
$$\max D(S, c), \quad D = \frac{1}{m} \sum_{x_i \in S} I(x_i; c) \tag{4}$$
According to Eq. 4, the features most dependent on the class are selected; however, the dependency among these features can be considerable. Therefore, the mutual information between features is obtained per Eq. 5 to reduce redundancy:
$$\min R(S), \quad R = \frac{1}{m^2} \sum_{x_i, x_j \in S} I(x_i; x_j) \tag{5}$$
To obtain the optimal feature set under the minimum-redundancy and maximum-relevance criteria, Eqs. 4 and 5 are combined to obtain Eq. 6:
$$\max \Phi(D, R), \quad \Phi = D - R \tag{6}$$
In these equations, m represents the number of features selected in the feature set S, and x is the feature vector 20.
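To make the trade-off between relevance and redundancy concrete, the sketch below selects features greedily: at each step it adds the feature whose mutual information with the class, minus its average mutual information with the features already chosen, is largest. The greedy search, the discrete (count-based) estimate of mutual information, and all names and toy marker values are assumptions made for illustration; the paper does not state how its MRMR search was implemented.

```python
from collections import Counter
from math import log


def mutual_information(xs, ys):
    """Mutual information of two discrete sequences, estimated from counts."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    return sum(
        (c / n) * log((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in joint.items()
    )


def mrmr(features, target, k):
    """Greedily pick k feature names by relevance minus mean redundancy."""
    selected = []
    remaining = list(features)

    def score(name):
        relevance = mutual_information(features[name], target)
        if not selected:
            return relevance
        redundancy = sum(
            mutual_information(features[name], features[s]) for s in selected
        ) / len(selected)
        return relevance - redundancy

    while remaining and len(selected) < k:
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical toy data: marker_b duplicates marker_a, marker_c is weaker but not redundant
target = [0, 0, 0, 0, 1, 1, 1, 1]
features = {
    "marker_a": [0, 0, 0, 0, 1, 1, 1, 0],
    "marker_b": [0, 0, 0, 0, 1, 1, 1, 0],
    "marker_c": [0, 1, 0, 0, 1, 1, 0, 1],
}
selected = mrmr(features, target, k=2)
```

In this example the duplicate marker is skipped in favor of the weaker but less redundant one, which is the behavior the redundancy term is meant to produce.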
The F-test is a statistical test that calculates, for a feature, the ratio of the variance between groups (instances with the same target value form a group) to the variance within groups, as in one-way Analysis of Variance (ANOVA). It ranks features by their F-score, where higher values indicate smaller distances within groups and larger distances between groups. The F-score in this method is given by 21:
$$F = \frac{\text{variance between groups}}{\text{variance within groups}}$$
where the variance between groups is the variance between the groups indicated by the target feature, and the variance within groups is the sum of the variances within each group.
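As an illustration of the F-score used for ranking, the sketch below computes the standard one-way ANOVA F statistic for a single feature, that is, the between-group mean square divided by the within-group mean square. The function name and the toy values are hypothetical, and the use of the usual degrees-of-freedom normalization is an assumption, since the paper gives only the ratio in words.

```python
def f_score(values, labels):
    """One-way ANOVA F statistic of one feature across the groups given by labels."""
    groups = {}
    for v, g in zip(values, labels):
        groups.setdefault(g, []).append(v)
    n, k = len(values), len(groups)
    grand_mean = sum(values) / n
    # Between-group sum of squares: group means around the grand mean
    ss_between = sum(
        len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups.values()
    )
    # Within-group sum of squares: values around their own group mean
    ss_within = sum(
        (v - sum(g) / len(g)) ** 2 for g in groups.values() for v in g
    )
    # Raises ZeroDivisionError if every group is constant (ss_within == 0)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Hypothetical toy features scored against the same two groups
labels = [0, 0, 0, 1, 1, 1]
informative = f_score([1, 2, 3, 7, 8, 9], labels)
noisy = f_score([1, 8, 3, 2, 7, 9], labels)
```

A feature whose values cluster tightly within each group and differ between groups gets a much larger score than one whose values overlap across groups.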
The Chi2 FS algorithm was used for stress classification, with individual chi-square tests used to assess the independence of each predictor variable from the response variable. A small p-value indicates that a predictor variable depends on the response variable, making it an important feature 22.
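A minimal sketch of one such chi-square test is given below. It computes the Pearson statistic for a contingency table of a categorical predictor against the class label and, for the 2 x 2 case only, the corresponding p-value. The function names and the counts are hypothetical; the closed-form p-value applies to one degree of freedom and is not a general chi-square routine.

```python
from math import erfc, sqrt


def chi2_statistic(table):
    """Pearson chi-square statistic and degrees of freedom for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


def p_value_one_dof(stat):
    """Upper-tail p-value of the chi-square distribution with one degree of freedom."""
    return erfc(sqrt(stat / 2.0))


# Hypothetical counts: rows are two marker states, columns are tolerant / sensitive
table = [[10, 20], [20, 10]]
stat, dof = chi2_statistic(table)
p = p_value_one_dof(stat)
```

Features can then be ranked by increasing p-value, so that the predictors least likely to be independent of the class come first.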

ML models
This section presents a brief description of all the deployed ML models. The models are introduced conceptually rather than mathematically; their mathematical formulations can be found in textbooks 23,24.

Gaussian process regression (GPR)
The GPR model is a nonparametric statistical method for determining the relationship between independent and dependent variables. It uses latent variables, an explicit basis function, and unknown data 25.

Linear discriminant analysis (LDA)
The discriminant analysis (DA) classifier introduced by R. Fisher is one of the simplest classifiers.
There are two types of DA classifiers: linear discriminant analysis (LDA) and quadratic discriminant analysis (QDA). In LDA, the decision surface is linear, while in QDA the decision boundary is nonlinear 26. Discriminant functions create decision boundaries that separate the different classes. The input space is thus divided into regions, each bounded by some decision boundaries. A classifier is represented by c decision (discriminant) functions, where c is the number of classes. The decision functions define the decision boundaries between classes or regions. The class label of an unknown pattern is determined by comparing the c discriminant functions and assigning the pattern to the class whose function gives the maximum score; within the region of a class, its discriminant function therefore has the highest value compared to the other discriminant functions 27,28.

Neural network (NN)
Neural networks (NNs) are derived from biological neural systems. These models try to simulate the behavior of brain neurons through defined mathematical functions, and the synaptic function of natural neurons is modeled artificially through weights calculated on the connections between neurons. The structure of an NN consists of input, output and hidden layers, connection weights and activation (transfer) functions. The input layer is a transmission layer that prepares and introduces the data; the output layer contains the values predicted by the network; and the hidden layers consist of processing nodes, where the data are processed 29.

Naive Bayes (NB)
NB is a probabilistic classifier based on Bayes' theorem under the assumption of complete independence among features. For classification problems, the NB model is powerful and intuitive. Its predictions assume that the predictors are conditionally independent given the class, so that the presence of one feature in a class is independent of the presence of any other feature 30.

Support vector machine (SVM)
SVM reduces classification error by finding a separating hyperplane with maximum margin, which can be described geometrically in terms of the convex hulls of the classes, combined with a loss function that penalizes points on the wrong side of the margin. SVM can also use kernels, such as the linear kernel or the Gaussian kernel, to map the feature vectors nonlinearly into a high-dimensional space. A linear SVM classifier has a linear decision boundary; kernel-based models have more flexible nonlinear decision boundaries, but linear SVM classifiers train faster than nonlinear SVM models 31.

Decision tree (DT)
DTs are algorithms that generate decision rules based on the expected reduction in entropy when the data are split on a feature. They tend to overfit the training data and perform poorly when applied to new datasets. For better results, they are frequently used in ensembles such as RFs 32.

Random forest (RF)
An RF is a bag of DTs. Each DT is trained on a new training dataset obtained by random sampling with replacement from the original dataset. In addition, some randomness is introduced into the construction of each tree: a subset of features is randomly selected for each decision branch of the DT. The RF prediction is given as the mean of the predictions of the individual DTs 33.

K-nearest neighbor (KNN)
One of the classifiers used in this research is KNN. In this method, all training samples are stored as multidimensional vectors in the input space, each with its position and category label. The distance of a new sample to all the training samples is usually a suitable criterion for determining the category of the new, unknown sample. The distance between two samples can be calculated with the Euclidean, Manhattan or Chebyshev metric. To determine the category of a new sample, its distance to all the samples stored in memory is calculated, and the k samples with the smallest distance to the unknown sample are selected. The category label held by the majority of these k samples is taken as the category label of the unknown sample 34.
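The procedure described above can be sketched in a few lines: compute the distance from the query to every stored sample, keep the k closest, and take a majority vote. The function names, the choice k = 3 and the toy samples are hypothetical and do not reflect the tuned settings of the paper.

```python
from collections import Counter


def euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


def chebyshev(a, b):
    return max(abs(x - y) for x, y in zip(a, b))


def knn_predict(train_X, train_y, query, k=3, distance=euclidean):
    """Label of the query by majority vote among its k nearest training samples."""
    ranked = sorted(zip(train_X, train_y), key=lambda pair: distance(pair[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy samples with two features each
train_X = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]
train_y = ["sensitive", "sensitive", "sensitive", "tolerant", "tolerant", "tolerant"]
prediction = knn_predict(train_X, train_y, [0.5, 0.5], k=3, distance=manhattan)
```

The distance metric and k are the main settings to tune; an odd k avoids ties in two-class problems.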

Hyperparameter optimization
The Bayesian Optimization Algorithm (BOA) is an effective method for the global optimization of objective functions that are costly to evaluate 35. The initial population is generated with a uniform distribution over all possible solutions. Each iteration of the BOA consists of four steps. First, promising solutions are selected from the current population using one of the selection methods. Second, a Bayesian network is built to describe the population of promising solutions. Third, new candidate solutions are generated by sampling from the Bayesian network. Fourth, the new candidate solutions are added to the previous solutions and replace all or some of them. The steps are repeated until a termination condition is reached. The termination condition can be convergence to a single member, reaching a sufficiently good solution, or reaching a certain number of iterations. There are different ways to perform each step of the BOA. For example, the initial population can be generated randomly or by using prior knowledge of the problem. The selection step can use any standard selection method from evolutionary algorithms. Different algorithms can also be used to build the Bayesian network, and different criteria can be used to evaluate the quality of candidate models. The ML model parameters optimized by the BOA are presented in Table 9.
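The four-step loop described above can be illustrated with a deliberately simplified sketch. Instead of learning a Bayesian network in the second step, it fits independent per-bit probabilities to the promising solutions, which is a much simpler stand-in model and not the BOA itself. Solutions are bit strings, selection is truncation to the best half, and the function name, parameters and toy objective are all hypothetical.

```python
import random


def simplified_boa(fitness, n_bits, pop_size=40, n_iter=30, seed=0):
    """Selection, model building, sampling and replacement on bit strings."""
    rng = random.Random(seed)
    # Initial population drawn uniformly over all bit strings
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    history = []
    for _ in range(n_iter):
        # Step 1: select promising solutions (truncation selection, best half)
        pop.sort(key=fitness, reverse=True)
        promising = pop[: pop_size // 2]
        # Step 2: build a model of the promising solutions
        # (independent bit frequencies here, NOT a Bayesian network)
        probs = [sum(ind[i] for ind in promising) / len(promising) for i in range(n_bits)]
        # Step 3: sample new candidate solutions from the model
        offspring = [
            [1 if rng.random() < p else 0 for p in probs]
            for _ in range(pop_size - len(promising))
        ]
        # Step 4: new candidates replace the worse half of the population
        pop = promising + offspring
        history.append(max(fitness(ind) for ind in pop))
    return max(pop, key=fitness), history


# Toy objective for illustration: maximize the number of ones in the string
best, history = simplified_boa(sum, n_bits=12)
```

Because the best half of each population is carried over unchanged, the best fitness recorded in `history` never decreases from one iteration to the next. In hyperparameter tuning, each solution would encode a model configuration and the fitness would be a validation score.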

Evaluation metrics
The ML algorithms have two phases: training and testing. During the training phase, a model is created to predict the state of other samples, and in the second phase its performance is measured on a set of test data.
In the testing phase, the goal is to evaluate the algorithm's performance from different aspects. A supervised method has a set of training data with known target values (labels). The goal is to find a function or rule, based on the characteristics of the training data, that predicts the targets of data entered into the model in the future. The performance of all ML models was evaluated by the MAE, RMSE and R2 metrics 37:
$$\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| y_i - \hat{y}_i \right|$$
$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2}$$
$$R^2 = 1 - \frac{\sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2}{\sum_{i=1}^{n} \left( y_i - y_{ave} \right)^2}$$
In these equations, \hat{y}_i and y_i are the predicted and actual values, y_ave is the average of the actual values and n is the number of observations.
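The three regression metrics can be computed with a few lines of code. The sketch below uses the standard definitions (mean absolute error, root mean squared error, and one minus the ratio of residual to total sum of squares); the function name and the toy values are hypothetical and are not data from the paper.

```python
from math import sqrt


def regression_metrics(actual, predicted):
    """Return (MAE, RMSE, R2) for paired actual and predicted values."""
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]
    mae = sum(abs(e) for e in errors) / n
    rmse = sqrt(sum(e * e for e in errors) / n)
    mean_actual = sum(actual) / n
    ss_res = sum(e * e for e in errors)                    # residual sum of squares
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)   # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    return mae, rmse, r2


# Hypothetical stress scores for illustration only
mae, rmse, r2 = regression_metrics([1.0, 2.0, 3.0, 4.0], [1.5, 2.0, 2.5, 4.0])
```

MAE and RMSE are in the units of the stress score, while R2 is unitless and equals 1 for a perfect fit.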
In the case of classification, after training and testing the ML model, the confusion matrix is computed on the training and testing datasets to obtain the different types of misclassifications (Fig. 5). A confusion matrix shows the successful and unsuccessful predictions of a classifier and contains information about the different types of accuracy and error. Each column of the matrix shows the samples predicted by the model, and each row contains the real (correct) samples. Confusion matrices make it easy to observe the errors and the confusion between classes and are used to estimate the desired performance measures.

Figure 1. Regression results between the actual and predicted salinity stress values by the NN model: (a) ReliefF algorithm over the train dataset, (b) ReliefF algorithm over the test dataset, (c) MRMR algorithm over the train dataset, (d) MRMR algorithm over the test dataset, (e) F-test algorithm over the train dataset, (f) F-test algorithm over the test dataset.

Figure 2. Regression results between the actual and predicted drought stress values by the NN model: (a) ReliefF algorithm over the train dataset, (b) ReliefF algorithm over the test dataset, (c) MRMR algorithm over the train dataset, (d) MRMR algorithm over the test dataset, (e) F-test algorithm over the train dataset, (f) F-test algorithm over the test dataset.

Figure 3. Confusion matrices of the KNN classifier for salinity stress using phenotype and genotype features: (a) ReliefF algorithm over the train dataset; (b) ReliefF algorithm over the test dataset; (c) MRMR algorithm over the train dataset; (d) MRMR algorithm over the test dataset; (e) Chi2 algorithm over the train dataset; (f) Chi2 algorithm over the test dataset.

Figure 4. Confusion matrices of the KNN classifier for drought stress using phenotype and genotype features: (a) ReliefF algorithm over the train dataset; (b) ReliefF algorithm over the test dataset; (c) MRMR algorithm over the train dataset; (d) MRMR algorithm over the test dataset; (e) Chi2 algorithm over the train dataset; (f) Chi2 algorithm over the test dataset.
[...] rolling in the morning | Spread of leaf tip dryness by a quarter in three leaves of the plant | 3
Moderately susceptible | Partial rolling and no rolling in the morning and evening | Drying of half of the young leaves and all the lower leaves | 5
Susceptible | Full rolling and no rolling in the morning | Dryness of the leaves spread to three-quarters of the leaves | 7
Highly susceptible | Like the roll and the rolling in the morning | Drought spread to all leaves | 9

Table 1. Results of prediction of salinity and drought stresses using phenotype and genotype features.

Table 2. Results of salinity and drought stress prediction using a combination of phenotype and genotype features.

Table 3 also shows the results of salinity stress classification using genotype features. The results indicate that the ReliefF FS algorithm and the KNN model outperform the other models, with accuracy, precision and F1 score equal to 0.97, 0.98 and 0.97 in the training phase and 0.89, 0.89 and 0.89 in the testing phase, respectively.

Table 3. Results of classification of salinity stress tolerance.

Table 4. Results of classification of drought stress tolerance.

Table 5. Performance metrics for classification of salinity and drought stress tolerance using the KNN classifier.

Table 7. Instructions for drought tolerance.

Table 8. Instructions for salinity stress tolerance.

Kira and Rendell formulated the original Relief algorithm, inspired by instance-based learning.

Scientific Reports | (2024) 14:17420 | https://doi.org/10.1038/s41598-024-68392-w | www.nature.com/scientificreports/

The latent function reflects the statistical nature of the model and is determined by the kernel (covariance) function. GPR models can provide accurate estimates with confidence intervals at any spatial point, capturing the uncertainty of the model predictions. The analyst can also choose explicit basis functions to specify the form of the model. Building and optimizing GPR models is feasible with today's high-performance computing capabilities.

Table 9. Hyperparameters of ML models optimized by the Bayesian optimization algorithm.